Lecture 12: Principal Components
Abstract
Consider the standard setting in which we are given $n$ points in $d$ dimensions; call these $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n$. As before, our goal is to reduce the number of dimensions to a small number $k$. In principal component analysis (PCA), we model the data by a $k$-dimensional subspace, and we find the subspace for which the error of this representation is smallest.

Suppose $k = 1$. Then we want to approximate the data with a line. Assume the data is centered, so that $\sum_{i=1}^{n} \vec{x}_i = 0$, and that the line passes through the origin. Let the line correspond to a direction $\vec{w}$, a unit vector. What is the error in approximating $\vec{x}_i$ with $\vec{w}$? We can use the perpendicular distance between the point $\vec{x}_i$ and the line represented by $\vec{w}$. It is easy to check that the perpendicular is given by $\vec{x}_i - (\vec{x}_i \cdot \vec{w})\,\vec{w}$, so that its squared length is $\|\vec{x}_i\|^2 - (\vec{x}_i \cdot \vec{w})^2$, by the Pythagorean theorem (using the fact that $\vec{w}$ is a unit vector).
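To make this concrete, here is a minimal sketch in Python/NumPy (the synthetic data and variable names are my own, not part of the lecture) that centers a point set, takes the best-fit direction $\vec{w}$ to be the top right singular vector of the data matrix, and verifies that each point's squared perpendicular distance equals $\|\vec{x}_i\|^2 - (\vec{x}_i \cdot \vec{w})^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the lecture's setting: n points in d dimensions,
# stored as the rows of X. (The data here is made up for illustration.)
n, d = 100, 5
X = rng.normal(size=(n, d))

# Center the data so that sum_i x_i = 0, as the lecture assumes.
X = X - X.mean(axis=0)

# The best-fit direction w (the first principal component) is the top
# right singular vector of the centered data matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
w = Vt[0]  # a unit vector of shape (d,)

# Perpendicular component of each point: x_i - (x_i . w) w.
proj = X @ w                       # the coefficients x_i . w, shape (n,)
residuals = X - np.outer(proj, w)  # shape (n, d)

# The squared perpendicular distance, computed two equivalent ways:
# directly, and via the Pythagorean identity ||x_i||^2 - (x_i . w)^2.
sq_dist_direct = np.sum(residuals ** 2, axis=1)
sq_dist_pythag = np.sum(X ** 2, axis=1) - proj ** 2
assert np.allclose(sq_dist_direct, sq_dist_pythag)

print("total squared error of the best-fit line:", sq_dist_direct.sum())
```

Taking $\vec{w}$ from the SVD is the standard way to compute the first principal component; the assertion simply checks the Pythagorean identity derived above on each point.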
Similar resources
CS 168: The Modern Algorithmic Toolbox, Lecture #8: How PCA Works
Last lecture introduced the idea of principal components analysis (PCA). The definition of the method is, for a given data set and parameter $k$, to compute the $k$-dimensional subspace (through the origin) that minimizes the average squared distance between the points and the subspace, or, equivalently, that maximizes the variance of the projections of the data points onto the subspace. We talked ab...
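The equivalence this abstract mentions follows from the identity above: summing over the points, $\sum_i \|\vec{x}_i - (\vec{x}_i \cdot \vec{w})\vec{w}\|^2 = \sum_i \|\vec{x}_i\|^2 - \sum_i (\vec{x}_i \cdot \vec{w})^2$, and the first term on the right does not depend on $\vec{w}$, so minimizing the total squared distance is the same as maximizing the variance of the projections (the data being centered). Here is a quick numerical check of that claim, again only a sketch on made-up data, not code from either lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up anisotropic data in the plane, centered as before.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 1.0]])
X = X - X.mean(axis=0)

# Sweep candidate unit directions w(theta) and record both objectives:
# the total squared perpendicular distance to the line through the
# origin in direction w, and the variance of the projections x_i . w.
thetas = np.linspace(0.0, np.pi, 1000, endpoint=False)
sq_dists, variances = [], []
for t in thetas:
    w = np.array([np.cos(t), np.sin(t)])
    proj = X @ w
    residuals = X - np.outer(proj, w)
    sq_dists.append(np.sum(residuals ** 2))
    variances.append(proj.var())

# The distance-minimizing direction coincides with the
# variance-maximizing one, as the abstract states.
assert np.argmin(sq_dists) == np.argmax(variances)
```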
CS 168: The Modern Algorithmic Toolbox
Principal components analysis (PCA) is a basic and widely used technique for exploring data. If you go on to take specialized courses in machine learning or data mining, you'll certainly hear more about it. The goal of this lecture is to develop your internal mapping between the linear algebra used to describe the method and the simple geometry that explains what's really going on. Ideally, after ...
Principal Fitted Components for Dimension Reduction in Regression
We provide a remedy for two concerns that have dogged the use of principal components in regression: (i) principal components are computed from the predictors alone and do not make apparent use of the response, and (ii) principal components are not invariant or equivariant under full rank linear transformation of the predictors. The development begins with principal fitted components [Cook, R. ...
Fisher Lecture: Dimension Reduction in Regression
Beginning with a discussion of R. A. Fisher’s early written remarks that relate to dimension reduction, this article revisits principal components as a reductive method in regression, develops several model-based extensions and ends with descriptions of general approaches to model-based and model-free dimension reduction in regression. It is argued that the role for principal components and rel...